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Abstract 

A sequential importance sampling algorithm is developed for the distribution that 
results when a matrix of independent, but not identically distributed, Bernoulli random 
variables is conditioned on a given sequence of row and column sums. This conditional 
distribution arises in a variety of applications and includes as a special case the uniform 
distribution over zero-one tables with specified margins. The algorithm uses dynamic 
programming to combine hard margin constraints, combinatorial approximations, and 
additional non-uniform weighting in a principled way to give state-of-the-art results. 

Keywords: bipartite graph, conditional inference, permanent, Rasch model, uniform dis- 
tribution 

1 Introduction 

Let O* denote the set ofmxn binary matrices with row sums r = (ri, . . . , r m ) and column 
sums c = (ci, . . . , c n ), and let w = (wij) G [0, oo) mxn be a given nonnegative matrix. Define 
the distribution P* on {0, l} mxn via 

ij zeQ* ij 

where 1 is the indicator function and where we assume k > 0. P* is the conditional dis- 
tribution of an m x n array of independent Bernoulli random variables, say B = (-£%), 
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with ¥(Bij = 1) = Wij/(1 + Wij) given the margins r and e, where P denotes probability. 
This paper describes an importance sampling algorithm that can be used for Monte Carlo 
approximation of probabilities and expectations under P* and also for Monte Carlo approx- 
imation of k. After a preprocessing step, sampling from our proposal distribution requires 
0(md) operations per matrix, where d = £ ■ c j = £j r i * s the total number of ones in the 
matrix. 

We are not aware of any existing importance sampling algorithms that permit practical 
inference under P*, although many special cases have been studied in the literature. For 
example, if id = 1 then P* is the uniform distribution over zero-one tables with specified 
margins, or equivalently, the uniform distribution over bipartite graphs with specified degree 
sequence. For square matrices, if w is identically one except with a zero diagonal, then P* 
corresponds to the uniform distribution over directed graphs with specified degrees. And 
if r = c = 1, then P* is a distribution over weighted permutation matrices and k is the 
permanent of w. Empirically, our algorithm outperforms all existing importance sampling 
algorithms in these special cases. Although our algorithm works well for most examples 
arising in practice, performance depends on r, c, and w. Highly irregular margins or highly 
variable w, particularly many zero entries in w, tend to cause poor performance. 

P* factors in such a way that we need only focus on the distribution, say P, of the first 
column. The columns are sampled sequentially, with each successive column viewed as the 
first column of a smaller matrix with updated margins based on the previously sampled 
columns. We decompose the structure of P into margin constraints, combinatorial factors, 
and non-uniform weighting terms, combine approximations of these terms in a principled 
way, and then use a dynamic programming algorithm to exactly and efficiently sample from 
the resulting proposal distribution Q for the first column. Sequentially sampling columns 
in this way defines a proposal distribution Q* for the whole matrix. This strategy for 
algorithm design works well for many similar problems, including symmetric matrices and 
nonnegative integer-valued matrices, each of which will be described elsewhere owing to 
space constraints. It seems likely that the design principles used for our approach are 
applicable much more broadly. 

2 Motivating applications 

2.1 Conditional inference for graphs and tables 

Let B £ {0, l} mxn be a matrix of independent Bernoulli random variables with 

logit FiBij = 1) = Oi + & + £* e^ kij , (2) 

where a, (3, and 6 are parameters, perhaps with constraints to ensure identifiability, and £ 
is a collection of observed covariates. Models of this form arise, for example, in educational 
testing, where Bij indicates whether or not subject i respo nded correct l y to q uestion j. If 



6 = 0, then the model reduces to the classical Rasch model (jRaschl . 1 1 9601 . 1 1 96 ll ) . Otherwise, 
it is an extension of the Rasch model to include item-specific covariate effects, such as each 
subject's prior exposure to the content being tested in each question . This model is also a 



simp l e version of models u s ed for the analysis of ne twork data (c.f.. iHolland fe Leinhardt . 



198ll : iFienberg etail Il985l : iGoldenberg et all l201()h where B is the adjacency matrix of a 



directed graph, a and (3 allow for degree heterogeneity, and £ is a collection of edge-specific 
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covariates. For example, for social network data we might have that Bij indicates whether 
subject i reported subject j as a friend, aj controls the relative propensity for subject i to 
report friends, j3j controls the relative propensity for subject j to be reported as a friend, 
and indicates whether the relationship between subject i and j is of type k. 

In both of these examples, if 6 is the only parameter of interest, then the nuisance 
parameters a and (3 complic ate inference and can be removed by conditioning on the row 
and column sums of B (e.g., ICaxL Il958l : iHolland fc Leinhardtl Il98ll : iMehta fc Patel 1199.4 



Harrison! . l2012h . Conditioning also results in inferential procedures that are robust to 



modeling assumptions, implicit in ([2j), about the distribution of the margins. The resulting 
conditional model that drives inference is exactly P* = Pg with Wij = exp(^ fc OkCkij), 
perhaps with the additional constraint that wu = in the case of network data. The 
conditional model is a natural exponential family in 6 with no nu isance parameters, but 



with an intractable normalization constant K = n e . l5a7r^ « Example 4.2) provides 
details about an importance sampling approach to exact conditional inference for this model. 
The example there is based on a preliminary version of the algorithm presented here. 

Other approaches to conditional inference in this setting include e xhaustive enumera tion, 
such as t he algorithm s for conditional logistic regression in Stata (|StataCorp| . 120091 ) and 
LogXact rtCvtell201ch . Markov chain Monte Carlo approaches, such as the elrm R package 
( Zamar et al. . 2007), and a nalyt ic approximations, such as the cond R package ( Brazzalel 
2005 ; Brazzale &: Davison . 20081 ). none of which are practical for larg er matrices and/or 



multivariate 0. Approximation of k was considered in iBarvinokl (|2010bl ) 



2.2 The uniform distribution and model validation 

If w = 1, or more generally, if Wij = exp(aj+/3j) for real-valued a and (3, then P* is the uni- 
form distribution on O*. The uniform distribution can be used for testing if = in model 
([2]), or equivalently, for model validation of ([2]) specialized to t he case of = 0. Prominent 
examples include goodness - of-fit tests for the Rasch model (e.g-. lRasch . 1960, 19611 : iPonocnyl . 
200ll : lchen & Small 120051 : IChen et all 120051) and fo r random bipartite graphs and directed 



graph s without reciprocity (e.g., IWassermanl . 119771 : IHolland Sz Leinhardtl . Il98ll : ISnijdersl . 



19911 ). 

The uniform distribution over £1* also plays a central role in testing for the presence 
of interactions in co-occurrence tables, particularly co-occurre nce tables arising in ecology , 



Sniiders, 


1991; 


Gotelli. 


2000; 


Chen et al.. 



subject to the margin totals is taken as a null hypothesis of no interaction among species 
(without necessarily assuming model ©). Our original motivation for developing these 
algorithms came from a similar problem in neuroscience where B^ indicated whether neuron 
i produced an action potential in time bin j, and the uniform distribution was taken as a null 
hypothesis of a lack of interaction among the neurons. This can be viewed as an example 
of conditional testing for multivariate binary time series. In that example, mn ~ 10 8 and 
existing algorithms in the literature were not practical. 

Besides the many statistical applications, when w = 1, the normalization constant k is 
the number of binary matri ces with specified ma r gins, a topic of enduring interest in theoret 



ical computer science (e.g.. iKannan et ak l.ll999l:ljerrum et all liool :lBezakova et all 120071) 
and combinatorial approximation (e.g.. iBekessv et all 19721 ; iMcKav . 1984 ; Greenhill et al. . 
20061 : ICanfield et all . I2OO8I : IBarvinokl . EoiOal ). Importance sampling algorithms for P* can 
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be used to provide efficient approximations of k (|Blanchetl . l20()9h . 

Monte Carlo sampli ng algorithms fo r the uniform distribution have been developed 



by many authors (e.g., Besag k. Clifford L 19891: McKav k. Worma 



2008 



Rao et al.l.fl996l:IChen et al.l . l2005l : lBlanchetl . l2009l : lBezakova et al.l . l2007l : IChenl . l2007l : lverhelstl . 



dl. 1199(1: IS niiders. 11991 



Bavati et all l201Gh. The approach here was inspired by the importance sampling al- 



gorithm in IChen et al.1 (|2005h . but provides a more principled method for algorithm design 
that leads to substantial improvements in the uniform case and that also extends to the 
non- uniform case. 



2.3 Permanents and permanental processes 

If w is square and r = c = 1, that is, if 0* is the set of permutation matrices, t hen n is 
the permanent of w , also of enduring interest in theoretical computer science (e.g.. IValiantl . 



19791 : iJerrum et all 12004! ) . A variety of generalizations of permanents and determinants 
can be expressed as K/i, where \i = E(h(Z)) for some functi o n h, where E denotes expected 
value, and where 
Diaconis k Evans 



Z has distribution P* (e.g., iLittlewoodl . Il950l : IVere-.Tonesl . Il988l . Il997l : 
In principle, the algorithms here could be used to approximate 



the value of any of these objects, but the practicality of t his approach depends heavily on 
h. For example, the a-permanent ( Vere- Jonesl . 1988 . 19971 ) 



is 



(3) 



where a £ M and cyc(z) is the number of disjoint cycles in the permutation correspond- 
ing to z. The case a = 1 corresponds to the permanent of w, and the case a = — 1 
corresponds to (— l) n times the determinant of w. Permanents and a-permanents arise 
in probability, statistics , and s t atistical physics in connection to permanental processes 
and random fields (e. g., Macchi . 1975 : Diaconis &i Evans . 2000; Shirai $z Takahashi . 20031 : 



McCullagh MollelEood lKou k Mc Cullagh. 



tics ( Vaughan Venablesl . 19721 : Bapat & Be 
approximating (|3|) when a > and | loga| is small 




and the distribution of order statis- 
Our approach is often effective for 



3 Algorithm design 

3.1 The target distribution for the first column 

For a matrix z = (z{j) we use z^ to denote the j'th column of z, we use z^ :k to denote 
the submatrix formed from columns j, . . . ,k, we use R(z) = {Ri{z)) to denote the column 
vector of row sums defined by Ri = ^ ■ , and we use C(z) = (Cj(z)) to denote the row 
vector of column sums defined by Cj = ^>2iZij. Fix the size of the matrix, mxn, the 
weights w, and the margins, r and c, and let Z have distribution P* defined in (pQ) with 
n* = {z e {0, l} mxn : R(z) = r, C{z) = c}. 

To sample from P* we need only design a generic algorithm (generic in m, n, r, c, w) for 
sampling from the distribution P of the first column, namely, 

P(x) = P(Z* = x). 

The reason is that the conditional distribution of Z 2m given Z 1 has the same form as P* in 
(PQ), but with different parameters. The size of the matrix is now m x (n — 1), the row sums 
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are updated to r — R^ 1 ), the column sums are updated to c 2;n , and the weight matrix 
is updated to w 2:n . Once we have sampled Z 1 , then we can update these parameters and 
effectively start over, treating the second column of Z like it was the first column of the new, 
updated problem, and then continuing sequentially until we have sam pled the entire matrix. 
This is the same sequential strategy suggested by IChen et al. (|2005h . The supplementary 



^87 



Y 1 = x 



material (located at the end of this document) contains a more detailed description of this 
column-wise factorization. 

Henceforth, our target distribution is P, the distribution of Z . Let Y be a random 
matrix chosen uniformly over Q*, let O denote the support of Y 1 , namely, 

Q = {x G {0, l} mxl : x = z\ z G Q*}, 

and for x G f2 define 

U(x) = ¥(Y 1 = x), V(x)=E{J[ 

ij 

It is straightforward to verify that 

P(x) oc U(x)V(x)t{x G n} (4) 

for x G {0, l} mxl . This factorization conceptually isolates the hard margin constraints, il, 
the combinatorics, U, and the non-uniform weighting, V . Although the separation is clearly 
artificial, it is useful to treat each of these factors separately when developing a proposal 
distribution. 

3.2 The proposal distribution for the first column 

Motivated by the factorization in ([4]), we consider proposal distributions for the first column 
of the form 

Q(x) oc U(x)V(x)t{x G Cl}, 
where U and V are approximations of U and V, respectively, that factor according to 

m m 

U{x)oiJ{uf, V(x)o^l[v^ 

i=i i=i 

for some u, v G [0, oo) mxl , and where Cl is of the form 

n = {x G {0, l} mxl : G A, ELl ^ € B ia i = 1, . . . , m} (5) 

for some permutation 7r = (tti, . . . , 7r m ) of (1, ... , m) and some subsets .4. = 4.1 x • • • x ^4 m C 
{0, l} m and B = B\ X • • • X B m C {0, 1, . . . , ci} m . Combining these approximations creates 
a proposal distribution of the form 

m 

Q(x) ocfju^f'l^ g A, ELi^ e ^} e {0,l} mxl ). (6) 

Any proposal distribution of this form permits fast, exact sampling and evaluation using 
0{mc\) operations; see Section ET3l The challenge is to find easily computable choices of 
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u, v, 7r, A., and B such that Q is a good approximation to the target P. Fortunately, this 
seems to be possible in many cases; see Section HI 

For importance sampling to work, the support of Q, which is a subset of Q, must contain 
the support of P, which is a subset of Q. When w has no zero entries, we require u and 
v to be positive and we engineer Vt to exactly coincide with SI so that P and Q both have 
support fi; see Section [4.11 When w does have zero entries, we modify v and fi to exclude 
certain elements in 0, but only elements that are not in the support of P. This ensures that 
the support of Q contains the support of P, but the supports may no longer be identical. 
In this case, if the importance sampling algorithm generates a column that is not in the 
support of P, then as it sequentially generates additional columns it will eventually try to 
create a Q that is identically zero, indicating that no assignment of the current column 
simultaneously satisfies the margin constraints and has positive weight. At this point the 
algorithm can assign an importance weight of zero and terminate. Certain patterns of zero 
weights make the algorithm highly inefficient because the algorithm rarely terminates with 
a nonzero importance weight. In the supplementary material we discuss alternative choices 
of v and $7 that are more efficient for certain patterns of particular interest, including the 
special case of zeros only on the diagonal. 



3.3 Efficient sampling and evaluation of the proposal 

Let X E {0, l} mxl have a distribution Q that factors according to (|6|) above for some 
u, v, 7r, A, and B. Define the permuted partial sums S G {0,... ,ci} mxl according to 
Si = Y2e=i Xttc for each i, and note that X and S are in bijective correspondence. The 
distribution of S factors according to 

m 

F(S = 8)xY[hi{8 i - 1 ,S i ) (7) 

1=1 

for hi(si-i, Si) = Sl ~ 1 vt\ St ~ 1 t{si — Sj_i G Ai, Sj G Bi}, where here and below we define 
So = sq = for notational convenience. 

The factorization in (J7]) implies that S is a Markov chain. If we were given the standard 
Markov chain representation 

m 

F(S = s) = Y[¥(Si = Si\S i - 1 = Si- 1 ), (8) 
i=\ 

then generating a random observation of S would be trivial. It is known that dynamic 
programming can be used to convert from Gibbs r andom field representations like ([7]) into 
Bayesian network representa tions like (El); see, e.g ., Frevl (1998). The next theorem, which is 
straightforward to verify (cf. Harrison &: Geman , 20091 ). summarizes dynamic programming 
in this context. 

Theorem 1. Let (Sq, Si, . . . , S m ) be a sequence of random variables where each Si takes 
values in the finite set Di and where Dq = {0}. Suppose there exists a sequence of functions 
hi : -Dj-i x Di i— )■ [0, oo) for i = 1, . . . , m such that the distribution of (Si, . . . , S m ) can be 
expressed as 



°(Si = sx, ■ ■ ■ , S m = s m ) oc Y[ hi(si-!, Si). 



i=i 



6 



Recursively define g m (s m -i,s m ) = h m (s m -i,s m ) and 

9i(si-i,Si) = hi(si-l,Si) E 9i+l( s h s i+l) (i = 1, • • • , m - 1), 

where each gi is defined over x D^. Then So, . . . , S m is a Markov chain and 
F(Si = aASi-i = Sj_i) = ^lllzllll} (i = l,...,m). 

In the present context, Si G Bi C {0, . . . , ci} for each i, so the algorithm described in 
Theorem [I] for converting from ([7]) to © requires at most 0(mc\) operations. In fact, since 
in the present situation we have hi{si-\,Si) = for Sj — G" {0,1}, implying the same 
for gi, this yields an algorithm that requires 0(mci) operations. Instead of representing 
all (ci + l) 2 combinations of (sj_i, Sj), we represent only the 2c\ + 1 feasible combinations. 
Once the representation in ([8]) is computed, generating a random observation X from Q or 
evaluating Q(x) at any a; takes 0(m) operations. 



4 Specification of components 

4.1 Margin constraints 

Here we discuss the construction of fi. In particular, the next theorem shows how to ensure 
that f2 = O for easily computable choices of 7r, .4., and B. 



Theorem 2. k^sn et all \200A ) Assume Q* ^ 0. Choose tt so that r ni > • • • > r„ 



For 



-4/ 



/or 6j = ^ 



, m, define 

'W (^=0); 
{0,1} (0<r^<n); 
,{!} (r ffi = n), 



Bi 



{max{0, bi}, . . . ,c\} (i < m) 
{ci} (i = m) 



=1V 7T< 



Sj=2 l{cj > •£})■ Define £1 according to ([5]). T/ien O = O. 



It is instructive to see how these choices of tt, A., and B ensure that Ct 5 which 
is th e primary req uirement for importance sampling. The Gale-Ryser conditions (jGald . 
19571 : iRyserl . Il957l ) state that there is a binary matrix with margins r G {0, . . . , n} mxl and 
c G {0, . . . , m} lxn if and only if ^ rj = ^T,j c j an d 



for all i = 1, . . . , m — 1, 



where the permutation 7r is chosen so that r ni > ■ ■ ■ > r nm . It is straightforward to see 
that x G {0, l} mxl will be in f2 exactly when there is a way to fill out the remaining n — 1 
columns of the binary matrix that obey the updated margins after accounting for x. In 
other words, x G exactly when r — x and c 2:n satisfy the Gale-Ryser conditions for the 
margins of an m x (n — 1) binary matrix. 

The set A. is chosen so that x G A. if and only if r — a; G {0, . . . ,n — l} mxl . The 
set i3 m is chosen so that YliLi x i if and only if YliLi r i ~ x i = Sj=2 c i- This is 
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equivalent to enforcing the column sum Y^iLi x i = c i- Choosing the permutation cj) so that 
T(j, x — Xfa > ■ ■ ■ > r^ m — x^ m , the remaining Gale-Ryser conditions are that 

% i 

j>&-s&)<i^l{c7>*} for alii = l,...,m-l, (9) 

1=1 1=1 j>2 

which implies that 

i i 

5^(rv, - x Vl ) <J2Y1 ^ - ^ for alii = 1, . . . , m - 1, (10) 

i=\ e=i j>2 

since the permutation 4> makes the left side as large as possible. Solving (fTUj) for Y^i=i 
gives the bounds encoded in the £>j and shows that S7 3 

We c annot use the perm utation <j> in the construction of Q because <p depends on x, 
however, Chen et al. ( 20051 ) further prove that ([9]) and (|10p are in fact equivalent, which 



means we are in the ideal situation where f2 = 0,. (Although they made use of the factor- 
ization in Theorem [21 their proposal distributions were not of the for m in fl6), exc ept in the 
special case where $7 = {x G {0, l} mxl ; Y^i x i = c i}-) Furthermore. IChenl ( 20071 ) provides 



an extension of Theorem [2] for the case where, in addition to the margin constraints, also 
enforces a fixed pattern of structural zeros for which there is at most one structural zero 
in each row and column. This includes the important special case of adjacency matrices of 
directed graphs; see supplementary material. 

4.2 Combinatorial approximations 

Here we discuss approximation of U by U. Define 

N m , n (r, c) = \{z G {0, l} mxn : R(z) = r, C(z) = c}\ 
to be the number ofmxn binary matrices with row sums r and column sums c. We have 

(where as before, Y is uniform over f2*) and we desire an approximation of the form 

u{x)= iw J E U " () 

where 7 is an irrelevant positive constant. 

Temporarily pretending that ([lip is accurate for any x G {0, l} mxl ; we have 

U(P) = N m ^ L {r-I\c 2 -- n ) 
Ul ~ 17(0) N m ^{r,c^) ' 

where J* is the ith column of the m x m identity matrix i". We cannot use this directly, 
since it is trying to evaluate U outside of and, furthermore, computationally efficient 
procedures for evaluating N n ^ n _i are not available. Nevertheless, it suggests using 

u i = N m ^ 1 (r-I\c 2:n )/N m ^ 1 (r,c 2:n ) (t = l,...,m) (12) 
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for an approximation N of N that extends smoothly to invalid margins. 

Several asymptotic a pproximations for iV h ave appeared in the literature and could be 
used for N. For example, Canfield et al. ( 20081 . Theorem 1) suggest the following, which we 
write asymmetrically with respect to r and c in order to simplify ()13j) below: 



N m Jr,c) 



mn 

V— Ml 



-1 m 



i=l 



n : n 



m j exp^-^(l - /U m ,„(r, c)) (1 - i/ m , n (c))), 

2 



Hm,n(r, c) = T? m , n (c) ^ ( r j 5^ Cfc ) ' v m,n{c) = Vm,n{ C )^2[ C 3 z\j 

j=l V m k=l ' j=l ^ n k=l 



(ELi c fc)( mn -£Li c fc)' 

Substituting this into (fT2"j) and simplifying gives 



n — Ti 



exp 



/l 1 n 

77 m , n _i(c 2:n )(l - ^ m .„_i(c 2:n )) - - n + — V c k 



k=2 



(13) 



If r, = or 7-j = n, then the value of Xj is determined by 0, and any choice of Uj > gives 
the same Q; we use Ui = 1 in these cases. We find that (|13|) works well over a large range of 
margins when P* is uniform. It is excellent if the margins are approximately semi-regular, 
that is, if the row and column sums do not deviate substantially from their respective mean 
values. 



Fo r certain pathological cases with wildly varying margins, such as those in lBezakova et al 
(2006), ( [T3|) does not work well. However, if the margins are such that the resulting ma- 
trices have a very low density of ones, even if the margins are highly irregular, then good 
performance can be o btained by instead using the asymptotic approximation of N from 
Greenhill et ail tod . Theorem 1.3). Details are provided in the su pplementary material. 
In fact, for the specific pathological cases in iBezakova et al. (j2006h using this alternative 
approximation gives Q* = P* in the uniform case. None of the computationally efficient 
combinatorial approximations that we have found in literature work well when the margins 
are both highly irregular and lead to a moderate density of ones, but we are hopeful that 
advances in asymptotic enumeration techniques will eventually lead to approximations that 
wor k well in almost a ll cases. 

Chen et al. (j2005l ) observed that combinatorial approximations could be used to find 



a goo d choice of u and they mentione d an early asym ptotic approximation from lO'Neil 
(1969), which was explored further bv iBlanchetl (j2009l ) in an asymptotic analysis of the 
algorithm. The examples in IChen et al.1 (120051 ). however, use Ui = rj/(n — r t ), which is 
motivated by considering only the row margin constraints. Although there are several 
substantial differences between their proposal distribution and ours for the special case of 
the uniform distribution of O*, we suspect that much of the improved performance of our 
algorithm results from using more accurate combinatorial approximations. 

In the next section we use V and V to account for the effects of w, including the effects 
of zeros in w. Since these zeros affect the size of the support of P*, an alternative, perhaps 
more natural approach is to allow U and U to capture the effects of zeros in w. The 
supplementary material contains more details. 
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4.3 Non-uniform weighting 

Here we discuss approximation of V by V. To develop an approximation, we will ignore the 
col umn margins and consider only the row margins. This is similar to the approach used 
by IChen et al.l (|2005h to develop combinatorial approximations for the uniform case. Let 
B G {0, l} mxn be a matrix of independent Bernoulli(l/2) random variables, so that 



V(x) 



E 



(n 

«(n 

m 

n 



Y i3 



Bi. 



E 



B 



13 

x,R(B) = r 



Bi 



B 1 



x,R(B) = r,C(B) = c 



m n 

n«(n 



B44 



i=i 



Ba = Xi,Ri(B) 



(14) 



where 



m\]=iW* ii \Bn = l,Ri(B) = r i ) 



Vi 



1; i = l,...,m) 



i>n 



'j-i 



n Oj — 1 



(15) 



In the supplementary material we describe how to compute all possible v for all columns 
using 0(nd) operations (where d = X^i r i = 12j c j) i n a one-time preprocessing step that 
can be done prior to sampling. As with Ui, we always define vi = 1 if = or rj = n. 
For cases where iu has zeros, we can sometimes have a zero in the denominator of (I15D for 
rj > 0. This happens when fewer than of the n — 1 remaining weights in the row are 
nonzero. Consequently, we need to force Xi = 1, which we do by setting the corresponding 
^ = {1} in Section O 

An important observation that we have thus far neglected is that many different choices 
of w give rise to the same P*. Define 



A(w) = {te [0, oo) mxn : t = a(3 T o w, a G (0, 00 



,mxl 



/3G(0,oor xl }, 



where T denotes transpose and y o z is the Hadamard product, that is, element- wise mul- 



tiplication of matrices of the same size, defined by {y 



o z 



UijZij. Then for every 



t G A(w) it is straightforward to verify that the P* defined with the weight matrix w and 
the P* defined with the weight matrix t are identical. Similarly, the two versions of V 
differ only by an inconsequential constant of proportionality. Unfortunately, our approxi- 
mation V{x) oc V}* defined above does not share this invariance. Consequently, proposal 
distributions constructed with w and t, respectively, could differ, even though the target 
distribution does not differ. We find this unappealing and remedy it in a preprocessing 
step prior to the construction of Q* by first transforming w into an equivalent, canonical 
w G A(to). In particular, w is the unique element of A(w) with the property that its 
average nonzero entry over any row or column is one, namely, 



y^mj = y^A{ w ij > °}> 



i=i 



w 



■1 j 



>0} (i = l,...,m; j = l,...,n). (16) 



10 



The s olution to (|16p over A(w) exists, is unique, and is easy to find numerically (jRothblum fc Schneider . 
19891 ): see supplementary material for details and more discussion. In the examples below, 



we always define v in terms of w, not w. Not only does this ensure that Q* has the same 
invariance property as P*, but we find that performance of the algorithm tends to improve. 

If we know that P* is uniform over f2*, for example, if w = 1 or w = 1, then V is 
constant over f2, and we can ignore v in the construction of Q. 



5 Importance sampling 

5.1 Algorithm summary 

1. Preprocessing: Compute w from (fT6|) and precompute all possible v using (fT5|) with 
w in place of w; see supplementary material for details. Compute 7r, A, and B 
according to Theorem [2] and u according to (|13p for the first column. 

2. Generating a single observation, from Q*: The matrix .Z is generated column-by- 
column as follows. Set q = 1. Sequentially, for each column: 

(a) For the current m, n, r, c, compute 7r, .4., 13, u as above. After the preprocessing, 
updating these quantities based on the previously sampled column requires 0{m) 
operations. 

(b) Use Theorem [1] to compute the Markov chain representation for Q. For the jth 
column, this takes 0{mcj) operations. If Q = 0, then Z will not be in the 
support of P*; assign a final importance weight of zero and go to stepO 

(c) Generate a random observation X from Q and evaluate Q(X). This takes 0{m) 
operations. 

(d) Assign the current column of Z to be X. Update q <— qQ(X), c <— c 2;n , 
n <— n— 1, r <— r — X. If n > 0, continue looping over columns; go to step [2a] 
If n = 0, the final matrix is Z, and we have Q*{Z) = q; go to step El 

3. To generate additional independent observations from Q*, reset all variables to their 
original values after step Q] and repeat step [2j 

The same algorithm can be used to evaluate Q*{z) for any zeff. Simply assign X to be 
the current column of z in step [2cl instead of sampling a new column. (The algorithm can 
be applied for any ordering of the columns, and Q* will depend on the chosen ordering. The 
supplementary material describes the heuristics that we use to choose a column ordering.) 



5.2 Monte Carlo approximation and diagnostics 

Let Z have distribution P*, let Z\, . . . , Zt be random sample from Q* generated as above, 
and let h be a function over f2*. Define the unnormalized importance weights 

f{z) = ^{z)- Q*{z) (^{0,1} ), 

which we can efficiently evaluate for any z as described above. In the formula for f{z) 
it is important that we use w, not w, particularly if we are approximating k. We can 
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approximate n and fj, = E,(h(Z)) via importance sampling in the usual way, namely, 



ZLif(z t )h(z t 



Ml 



Kt 



T 



t=i 



(T —?■ oo). 



(17) 



Besides being c onsis t ent, kr and kt[it are also unbiased approxima tions of k and k/j,, 
respectively. See^ ffl) for details about importance sampling. See iHarrisonl for 
modifications when (frT|) is used to approximate a p-value. 

In this context, importance sampling algorithms are usually evaluated empirically by 
diagnostics related to the variability of the importance weights. The less variable the 
importance weights, the better the algorithm is judged to be performing. For the numerical 
illustrations below, we report 



cv T 



1 



T 



(T 



kt 



A 7 



t=i 



max t =i,...,r/(Zt) 
mm t =i,...,T f(Z t ) 



1. 



As T — > oo, the approximate squared coefficient of variation, cv T , converges to the true 
squared coefficient of variation, cv 2 = V(/(Zi))/E(/(Zi)) 2 = V(P*(Zi)/Q*(Zi)), where 
V denotes variance. T = T/(l + cv 2 ) has been suggested as a rough diagnostic for effective 
sample size, meaning that a sample size of T from Q* behaves roughly like a sample size of 
T from P* for the purposes o f Monte Carlo approximating [i for well-behaved functions h 



purp 

(jKong et al.1 . 119941 : iLiul . 120011 ) . For many but not all examples we find cVj> < 1, suggesting 
that Q* is appropriate for efficient importance sampling. The relative range of importance 
weights reported by At is an especially stringent diagnostic. For nearly constant mar- 
gins and P* close to uniform, we often find At ~ 0, suggesting that Q* is an excellent 
approximation of P* ; see Table [TJ 



6 Numerical illustrations 

We experiment with four different classes of weights based on a canonical matrix y whose 
entries are independently sampled from the uniform(0, 1) distribution: (I) Wij = 1, which 
is the uniform distribution over O*, (II) w^j = yij + 1, (III) Wij = yij, and (IV) Wij = 
— l(yij < 0.99) log(yjj), for all i,j. The specific entries of y for different sized matrices are 
in the supplementary material. The resulting P* is increasingly non-uniform in each of the 
latter three cases and has 1% structural zeros in case (IV). Recall that each w corresponds 
to a family of weights of the form a(3 T o w that give the same P* and Q*; see Section [4.31 
In all cases we report results with T = 1000. 

We begin with 500 x 500 ri-regular matrices, i.e., = Cj = r\ for all i, j = 1, . . . , 500. 
Results are summarized in Table [1] for r\ = 1,2,4, 8, .. . ,256. The diagnostics are striking, 
especially in the uniform case, for which the importance weights are essentially constant. 
Performance degrades slightly as P* becomes strongly non-uniform, but in all cases the 
estimated cv 2 is less than one. Low variability in importance weights corresponds to high 
precision in estimates of k. For example, in the uniform case, where k = for r\ = 256 we 
obtain k = (1.478301±0. 000044) x 10 73781 , where the errors are approximate standard errors 
estimated from the same importance samples, and for r\ = 2 we obtain kt = (2.27653 ± 
0.0001 7) x 10 2266 , the latter of which is close to the true value of k = 2.27 6 58 . . . x 10 2266 ; see 
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supplementary material. To our knowledge the exact value of k in these examples can only 
be efficiently compute d for the special case of the uniform distribution over either 1-regular 
or 2-regular matrices ( Anand et al. . 19661 ). Sampling from the uniform distribution over 



1-regular matrices is trivial, k = m!, and there is no need to use our algorithm, although it 
is comforting that Q* = P* in this case. 

We remark that the distributions corresponding to different weight classes in Table Q] 
are almost singular with respect to each other. For example, in the 1-regular case, if we use 
the Q* for weight class I as a proposal distribution for the P* corresponding to one of the 
other weight classes, then we obtain, for weig ht class II, At = 2xl0 13 and cv T = 8xl0 2 , 
for weight class III, = 2xl0 67 and cv T = lxlO 3 , and for weight class IV, only 6 of the 
1000 observations from Q* were even in the support of P*, owing to the structural zeros. 
Results are similar for other combinations and become even more extreme as r\ increases. 



Table 1: 500 x 500 ri-regular matrices 
uniform w class II w class III w class IV 



ri 


A T 




CV T 




A T 




CV T 




A T 


cv T 




A T 


CV T 




1 












2x10" 


i 


5x10" 


4 


4x10° 


4x10" 


2 


5X10 1 


3x10" 


i 


2 


4x10" 


-2 


5x10" 


6 


2x10" 


i 


4x10" 


4 


6x10° 


4x10" 


2 


8X10 1 


2x10" 


i 


4 


1x10" 


-2 


1x10" 


6 


1x10" 


i 


4x10" 


4 


5x10° 


3x10" 


2 


2xl0 2 


2x10" 


i 


8 


2x10" 


-2 


1x10" 


6 


2x10" 


i 


3x10" 


4 


3x10° 


3x10" 


2 


4X10 1 


2x10" 


i 


16 


1x10" 


-2 


1x10" 


6 


2x10" 


i 


3x10" 


4 


3x10° 


3x10" 


2 


4X10 1 


1x10" 


i 


32 


8x10" 


-3 


8x10" 


7 


1x10" 


i 


2x10" 


4 


2x10° 


2x10" 


2 


lxlO 1 


1x10" 


i 


64 


9x10" 


-3 


9x10" 


7 


1x10" 


i 


2x10" 


4 


3x10° 


2x10" 


2 


2X10 1 


9x10" 


2 


128 


1x10" 


-2 


9x10" 


7 


1x10" 


i 


9x10" 


5 


1 x 10° 


1x10" 


2 


5x10° 


5x10" 


2 


256 


9x10" 


-3 


9x10" 


7 


5x10" 


2 


5x10" 


5 


1x10° 


1x10" 


2 


9x10° 


7x10" 


2 



For the special case of 1-regular matrices, corresponding to the first row in Table HJ n is 
the permanent of w and various generalizations of the permanent correspond to expecta- 
tions under P*. The current state-of-the-art algorithm for approximating permanents and 
q-permane nts of general matr i ces, s ee ([3]) above, seems to be the importance sampling al- 
gorithm of iKou &: McCullagh (2009), which has about the same computational complexity 
as our algorithm. For the case a = 1, their algorithm is nearly identical to ours, the main 
differences being the choice of column order and our use of w, which seems to give our algo- 
rithm slightly better performance. Their algorithm is generally preferable for a / 1, since 
it is tailored to the specific choice of a, although in many cases performance is comparable. 
The supplementary materials h ave numerical comparison s for each of the w used in Table 
[Hand for all of the examples in Kou Sz McCullagh (2009), which include cases with a = 1 
and a = 1/2. It is interesting that in many cases our generic approach is competitive with 
specialized software. 

In Table [2] we repeat the simulations of Table Q] for 50 x 100 irregular matrices with 
margins r = kr and c = he, for the cases k = 1,...,4, where f T = (24 1 , 22 2 , 17 4 , 13 3 , 
12 2 , ll 3 , 10 2 , 9 3 , 8 6 , 7 1 , 6 4 , 5 4 , 4 5 , 3 6 , 2 4 ) and c = (12 2 , 10 2 , 9 5 , 8 4 , 7 6 , 6 11 , 5 10 , 4 18 , 3 9 , 
2 13 , l 20 ) using V to denote j copies of i. Performance degrades in the irregular case as the 
matrices become more dense. In most, but not all cases, the diagnostics suggest Q* could 
be used for efficient importance sampling. 

For the special case of the uniform distribution, corresponding to the far l eft category 



of wei ghts in Tables Q] and [21 the sequential importance sampling algo rithm of IChen et al 



(2005), as implemented in the publicly available R package networks is (jAdmiraal &: Handcockl . 
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Table 2: 50 x 100 irregular matrices with r = kr, c = kc 





uniform 


w 


class II 


w class III 


w class IV 


k 


A T 


CV T 


A T 




A T 


CV T 




CV T 


1 


4X1CT 1 


lxlCT 3 


3x10° 


5xl(T 2 


8X10 1 


5X1CT 1 


5xl0 3 


3x10° 


2 


3x10° 


3xlCT 2 


7x10° 


lxlO" 1 


7xl0 2 


2x10° 


6xl0 4 


7x10° 


3 


2xl0 2 


7X10" 1 


2xl0 2 


6XKT 1 


2xl0 4 


6x10° 


3xl0 6 


4X10 1 


4 


3xl0 6 


3X10 1 


3xl0 6 


2X10 1 


4xl0 9 


2xl0 2 


2xl0 13 


8xl0 2 



20081 ). appears to be the current state-of-the-art algorithm for practical Monte Carlo ap- 
proximation. Our algorithm is a substantial improvement, especially for dense or irreg- 
ular margins. Using networksis gives At = (lx 10 1 , lx 10 2 , 2x 10 5 , lx 10 11 ) and 6v\ = 
(2xl0 _1 ,6xl0 _1 , Ixl0 1 ,4xl0 2 ) for the first pair of columns in Table [2j The networksis 
implementation is several orders of magnitude slower than our implementation, and is too 
slow for most of the examples in Table [TJ 



Supplementary Material 

Supplementary material includes (i) a more detailed description of the column-wise factor- 
ization described in Section 13.11 (ii) details about the solution to (fT6j) and other prepro- 
cessing of the weights and margins, (iii) alternative combinatorial approximations for sparse 
matrices with irregular margins, (iv) extensions to Theorem[2]for the case of structural zeros 
with at most one structural zero in each row and column, including the case of structural 
zeros along the diagonal, (v) more principled treatments of structural zeros in the approx- 
imations to U and V, (vi) details for the numerical simulations, (vii) additional numerical 
illustrations, including examples using real data, and (viii) a Matlab implementation of the 
algorithm. 



A Column-wise factorization 

Define the set of binary matrices with margins r £ f$ mxl an d c£ J^ lxn to be 
n* m>n (r, c) = {ze {0, l} mx " : R(z) = r, C(z) = c}, 

and let 

N m>n (r,c) = \n* mjn (r,c)\ 

denote the number of such matrices, where N = {0, 1, . . . } denotes the nonnegative integers. 
For an m x n matrix w £ [0, oo) mxn define the function 

t{z € n* „(r,c)j ™ " ... 
Pkn(z I r, c, w) = \ r ^. n [J II (* e {0, 1}—) 

with normalization constant 



m n 

2. 



Km )n (r,c,w) = YlYl 

zen* (r,c) i=ij=i 



1 7 

w ij 
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using the convention that 0/0 — 0. If ftrn,n 

(r,c,w) > 0, then P^ n is a probability mass 
function and we use Z to denote a random binary matrix with this distribution. We use 
Z 1 to denote the first column of Z, which has probability mass function 

P m , n (x\r,c,w)=F(Z 1 =x)= K,, n ^\r,c,w)t{z 1 =x} (xe{0,l} mxl ), 

zef2^,„(r,c) 

the support of which is a subset of 

fim,n(r, c) = {x G {0, l} mxl : x = z\ z G fi^Jr, c)}. 

If the entries of w are strictly positive, then the support is all of £l m ^ n (r, c). 

The algorithmic challenge of sampling the entire matrix Z reduces to the challenge of 
sampling from the first column Z 1 , because once we have the first column, then we can 
update the margins and proceed sequentially, treating successive columns like the first. 
Indeed, it is straightforward to verify that 

P(Z' = x | Z 1 ^ 1 = y) = P m ,n-j+i{x | r - R(y) , c? :n , w^) , 

so that the generic decomposition P(Z) = P(Z 1 )H^ =2 F(Z j | Z 1 '^ 1 ) gives 

n 

P* >n (z | r,c,w) = P m , n (z 1 I r,c,w)l[P mtn - j+1 (z> r Haz^ { ).<^'.w^'). 

i=2 

(In the main text, we primarily focus on sampling the first column Z\, we suppress 
m, n, r, c, w in the notation as much as possible, and we assume that k > 0.) To summarize, 
our target distribution is the binary random vector Z\ G {0, l} mxl with probability mass 
function 

m n 

P(x) oc ^{R{z) = r, C(z) = c,z 1 = x}HH w% . 

ze{o,i} mxn i=ij=i 

B Preprocessing the weights and margins 

The preprocessing that affects the definition of Q* consists of transforming w into w and 
choosing an ordering of the columns. Other elements of the preprocessing are merely for 
computational efficiency. All of the preprocessing of w, but not reordering the columns, 
can be skipped when it is known that P* is uniform over f2*, e.g., when w = 1. 

B.l Computing w, the solution to equation (16) in the main text 

Fix w G [0,oo) mx ™. Define 

n m 

Hi = ^2l{wij > 0}, rnj = ^2l{wij > 0} (i = l,...,m; j = l,...,n). 

j=l i=l 

We are looking for the w G [0, oo) mxn with the following properties: 

n m 

Wij = ai(3jWij, ^ Wij = Hi, ^ Wij = nij (ai,(3j > 0; i = 1, . . . , m; j = 1,... ,n). 
j=i i=i 
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Initializing k/ ** = w and t = 1, we iterate the following fixed point equations until conver- 
gence: 



-(2t-l) 



(2t-2) 
ij 



En - 
1=1 W > 



(i = l,...,m; j = l,...,n) 



(2t) _ "Wj 



Em - 



(2t-2) 
(2t-l) 

(2t-T) ( i = i>- ••>"■*; i = 1 , ■ ■ ■>")> 



where superscripts are indices, and where we take 0/0 = 0. If we iterate this for T steps, then 
we use w = w^ 2T \ In our experience, a small T is usually adequate to reach convergence. 
Since each jun_ £ Mw), iterating to convergence is not important for validity of the 
algorithm. Rothblum Ik Schneider (1989) prove existence and uniqueness of w. They also 
show that the solution can also be found using a convex programming algorithm, but we 
have not experimented with this approach. 

The computational cost of computing w takes at least 0{mn) operations, but we do 
not have theoretical bounds on the computational complexity. In our experience, it can 
be treated as a negligible preprocessing step. Although this choice of w outperforms 
many alternatives, we have found n o theoreti c al justificat ion for its use. It is closely 



related to Sinkhorn balancing of w (jSinkhornl . 11964 . 119671 ). which has app eared i n the 
literature in both algori thmic and theoretical treatments of permanents (e.g., Ando . 19891 ; 
Beichl fc Sullivanl . ll999h . and it has the nice property that w = 1 whenever w = a.0 1 . In 
any case, -P mn (- | r,c,w) = P^ni' I r i c i w )i so switching from w to w does not change 
the target distribution. 



B.2 Choosing a column ordering 

Our algorithm is not invariant to the ordering of the columns, nor to the pattern of zeros in 
w. We use the following heuristic ordering of the columns. First, if w is banded, then we 
leave the columns in their original order. The special case of banded weights arises frequently 
in some applications and we find that the banded ordering works best for accommodating so 
many zero weights. In other cases, we reorder the columns first by decreasing column sum 
and then by decreasing variance of the entries of w within each column. These preprocessing 
steps, and the accompanying postprocessing steps of returning the columns to their original 
orders, all require negligible additional computation. In practice, if one is interested in a 
specific matrix for which these heuristics do not work well, then it can often be advantageous 
to experiment with different column orders or perhaps swapping the roles of rows and 
columns. For the description of the algorithm, when referring to the jth column, we mean 
the jth column after any reordering of the columns. 



B.3 Precomputing the constants v for all columns 

Define the symmetric polynomials 

n 

G n (y,k)= HT,"=il>j = k}l[Vj i (yeR n ,k = 0,...,n), (18) 

6e{0,l}" j=i 
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and let w{' k = (wij, . . . , Wik) denote the ith row of w 3:k . Before sampling we also precom- 
pute and store 

G(i,j,k) =G n ^ +1 (wf n ,k) (19) 

for all i = 1, . . . , m, j = 1, . . . , re, and k = 0, . . . , min(rj, n — j + 1). The entire collection 
can be computed in 0(nd) operations (where d = ^i r « = by initializing with 

G(i,j,0) = 1, G(i,n, 1) = u>j n , and G(i,j,k) = for all i,j and k > n — j + 1, and then 
using the recursive formula 

G(i,j, k) = G(i,j + l,k)+ WijG(i,j + l,k— 1). 

In particular, in equation (15) in the main text we see that for the first column 

( n - 1 )" 1 G(t J 2,r < ) 

(Recall that we use w, not w, in our implementation of the algorithm.) For the j'th column 
(1 < j < re) we will have 

"•>,(, /^'l: 7 :^ ;) Wj + U- ^(^^ 1 ) = 1) 

where z 3 are the previously sampled columns so that r — R(z 1:3 ) are the updated row 
sums when preparing to sample the j'th column. 



C Alternative combinatorial approximations 

For each positive integer I and any nonnegative integer a we define 

[a] e = a(a- 1) ••■ (a-£+ 1), 
and for a /c-vector t of nonnegative integers we define 



[t]t = X>k- 



8=1 



In Section 4.2 of the main text we used a combinatorial approximation due to lCanfield et al. 



(|2008l ). 'lowever, other approxim ations can also be used and may give better performance 
for some problems. For instance, Greenhill et al.l ( 2006 . Theorem 1.3) provide an alterna- 
tive combinatorial approximation for N m ^ n {r,c) that is accurate, asymptotically, for sparse 
matrices, except perhaps when the margins are extremely variable: 



N mn (r,c) 



imimnsuc,-! 



exp(-ai(c)[r] 2 - a 2 (c)[r] 3 - a 3 (c)[r]l). 



ai(c) 



Mi 

2[c\l ' 2[c]f ' 4[c]f 



[c] 2 



4 ' 



«2(c) = - 



Ma , [c]I „, , , _ [c] 2 , [c] 3 



+ 



3[c]f 2[c]f 



i ' 



M 

4[ctf ' 2[c]f 2[c]f 



+ 
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where we take 0/0 = 0. Following Section 4.2 in the main text, a straightforward calculation 
gives 



Ui 



; exp[(r; - l){2 ai (c 2 - n ) + 3a 2 (c 2 -- n )(r t - 2) + 4a 3 (c 2: ")([r] 2 - n + 1))]. 



This combinatorial app roximation i s an im provement of an approximatio n in IO'Neill(|l969h . 
which was mentioned in IChen et alJ (|2005h and st udie d bvlBlanchetl (j2009h . Both are exactly 
uniform for the pathological cases in Bezakova et al. ( 20061 ) : see Supplementary Section lR5l 



D Structural zeros and ones 
D.l Remarks 

The algorithm can be improved to better accommodate structural zeros. We avoided this in 
the main text to simplify the exposition, but the complexity of the algorithm does not change 
significantly. The numerical experiments in the main text do not use these improvements, 
even though some of the examples have structural zeros. 

We use the term structural zeros to denote positions such that u>ij = 0, which 

allows the investigator to explicitly force the binary matrix to zeros at those positions. It 
is possible that the row and column sums also force some entries to be zero, but we are not 
referring to those types of implicit structural zeros. 

Sometimes it is desirable to force an entry to be one. These structural ones can be 
accommodated using structural zeros. In a preprocessing step, we replace structural ones 
with structural zeros and decrement the row and column sums appropriately. Then we 
sample as usual. In a postprocessing step, we reinsert the structural ones. Henceforth, we 
only discuss structural zeros. 



D.2 Extensions to Theorem 2 in the main text for zero diagonal 

Here we report an extension of Theorem 2 in the main text to the case where w has at most 
one zero entry in each row and column. This includes the special case of a zero diagonal, 
which arises frequently when the binary matrices of interest are adjacency matrices of 
directed graphs. Unlike the main text, the order of the columns is important for the 
validity of the algorithm. The columns must be reordered during preprocessing so that 
ci > • • • > c n . 

Theorem 3. \CheA . \200d . \200i ) Assume that a > ■ ■ ■ > c n , fix w G [0,oo) mxn ; define 
aij = t{wij > 0} for each i,j, assume Ri(a) > n — 1 and Cj(a) > m — 1 for each i,j, and 
assume that Km n (r,c,w) > 0. Choose it so that r ni > ■ ■ ■ > r Wm and so that whenever 
r-Ki = r ni+1 we also have y ni < y ni+1 , where 



I the unique j such that 

Vl = \ j_ 1 
[n + 1 

For each i = 1, . . . , m, define 

'{0} (0^1^=0); 
A = < {0, 1} (0 < a ni xr ni < Rn t ( a ))'i 



if there exists such a j; 
otherwise. 



Bi 



{max(0,6j), . . . ,ci} (i < m); 
{ci} (i = m), 
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for 



(ELx' 



7Tf , 



mm 



j=l,-,n{^2k=j+l °k + Yle=l J2k=2 a ^ik}- 



Define Vt according to (5) in the main text. Then £1 is the support of P m ,n{' \ r,c,w). 



D.3 Alternative treatments of structural zeros 

Here we redefine .A, U, and V from the main text to account for structural zeros differently. 
We define A. according to Supplementary Theorem 3 above, which allows trivial cases to 
be handled by VL. 

Let Y be a random matrix chosen uniformly over the support of P* and define 



U(x) = ¥(Y 1 = x) 



N m , n -i{r - x,c*™) 
iY m ,„(r,c) 



V(x)=eC[[ 



w ij 



x 



so that, for any x, P{x) oc U(x)V(x)l{x G il}. In the main text, these definition were the 
same except that Y was chosen uniformly over f2*. If w forces structural zeros, then the 
support of P* may be smaller than £1*. We proceed exactly as in the main text to develop 
approximations of the new U and V. 

For the new U, we follow section 4.2 and note that N m ^ n (r,c) needs to be replaced 
by the size of the support of P^ n (- \ r,c,w), say N mtTl (r,c,w), and, consequently, N 
needs to be replace d by a combinatorial approximation of the size of the support of P* . 
Greenhill & McKavi (120091 ) provide the modified asymptotic enumeration results corre- 
sponding to those that led to equation (13) in the main text. Define etjj = 1{w{j > 0} 
for all Note that N m „ (r, c, w) = N mjn (r, c, a). For an approximation of iV mjn (r, c, w), 
Greenhill & McKay! hOiV.i Theorem 2.1) suggest 



-1 m 



Ri(a] 



n 



CAa] 



N m , n (r,c,a) =(^^«=i— ) 11 

i=l 

x exp[--(l - Hm, n (r,c)) (1 - i> m , n (c)) - S min (r,c, a)J , 
S min (r, c, a) = w (c) jr J2 ( (1 - a u) ( n ~ S Cfc ) { c i ~ it Ck 



i=i j=i 



fc=i 



fe=i 



which reduces to the formula in the main text when a = 1. The functions fJ>,v,r) are defined 
in the main text. This approximation leads to 



^(a 2: «) -n + 1 



exp r? m , n _i(c 



2:n\ 



(1 



/',„., t _i(c 2:n )) ( - -ri + — V^Cfc 



k=2 



+ Y(l-a „)(c 3 ---^ , 
^ v J m(n-l) 



J'=2 



1) Z - 



a 



fe=2 



where, as before, we set Ui = 1 whenever r*j = or = i?,(a 2:n ) + 1. 

For the new V", we follow section 4.3 in the main text, but define B to be a matrix of 
independent Bernoulli random variables where Bij is Bernoulli^./ /2) for a^- = l{wij > 0}. 
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Following equations (14) and (15) from the main text, the first change comes after the 
second equality in (15), giving 

(^ ( r ) )~ 1 E6 e{ o ) i } »- 1 l{E^ 1 1 6,=r i }n^ 2 ^- 1 
_ wniRija 2 ^) -n + l)Gn-x{^\n - 1) = w^R^a 2 ^) - n + l)G{i, 2, n - 1) 
r f G^i(to? ! Vi) riG(i,2,n) 

where we have made use of the fact that terms inside the summations in the first expression 
are zero whenever b has an entry of one in a place where there is a structural zero. The 
functions G n and G are defined above in supplementary section IB. 31 For zeros in the numer- 
ator or denominator of the expression for we set i>j = 1 and allow Ai to deterministically 
choose the appropriate value of the ith entry. As in the main text, we suggest replacing w 
with w throughout. 



E Numerical illustrations 

E.l Pseudorandom number generator 

The pseudorandom number generator used by the importance sampling algorithm for the 
numerical illustrations is the default pseudorandom number ge nerator in Matlab version 



7.14, which is the Mersenne twister algorithm mtl9937ar (c.f. iMatsumoto Nishimura . 
19981 ) described at 



http : //www. math, sci .hiroshima-u. ac . jp/~m-mat/MT/ emt .html . 



E.2 Canonical weight matrices 

The weight matrices w used in Section 6 of the main text are built from a canonical matrix 
y. The m x n canonical matrix y is constructed as follows: 

R{{j - l)m + i) 
Vi i = 2 3i _ 7 = 1, . . . ,m; j = l,...,n), 

where R{0) = 1 and 

R{k) = 7 5 R{k - 1) mod (2 31 - 1) (k = l,...,mn), 

The sequence R(l) , R(2) , . . . is a simple, well-known multiplicative congruential pseudo- 
random number generator, known as MINSTD, for the discrete uniform distribution over 
{1, . . . , 2 31 — 2} ( Park &: Miller . 1988). It was the default pseudorandom number generator 



in Matlab for many years and is fine for our purpose of creating a matrix y with independent 
uniform(0, 1) entries whose values are easy to communicate to others. 

E.3 The number of n x n two-regular binary matrices 



Anand et al.l (|1966l . Eq. (27)) give a simple recursive formula for the number of n x n 



two-regular binary matrices, say H n . Initialize Hi = 0, H2 = 1, H% = 6, and then 
H k = \k(k - if {{2k - 3)F fc _ 2 + {k- 2fH k ^) {k = 4, 5, . . . ). 
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The exact value of -H500 can be found in the appendix of this supplement. As noted in 
Se ction 6 of th e main text, our algorithm provides an extremely accurate approximation. 
Chen et al.l J2OO5I) used their importance sampling algorithm to approximate -f/ioo as 



(2.96 ± 0.03) xlO 314 based on a sample size of 100. For comparison, using a sample of size 
100 from our algorithm gives an approximation of (2.969 ± 0.001) x 10 314 , which appears to 
be almost 1000 times more efficient for the purposes of approximate enumeration. The true 
value is 2.9692 . . . x 10 314 . The full number can be found in the appendix of this supplement. 
We should also note that the importance sampling approximations are much more accurate 
than the combinatorial approximations upon which the importance sampling algorithm is 
based. For instance, using the approximation N from Section 4.2 of the main text gives 
2.957xl0 314 . 

E.4 Approximating a-permanents 

Here we report comparisons between using our algorithm f or approximating q - perm anents 
and using the custom importance sampling algorithm of iKou fc McCullagh (|2009h . We 



thank Sam Kou for sharing his code with us. The a-permanent of w can be expressed as 

per Q (w) = KE(a cyc(z) ), 

where Z has distribution P* with the same w and all row and column sums equal to 
one; see equation (3) in the main text. We approximate it using the consistent, unbiased 
approximation 



I 1 

per QiT (u>) = k T fi T = 7f;Yl f( Z t) h ( Z 



T 

t=l 

for h(z) = a cyc ( z ); see Section 5.2 and equation (17) in the main text. 

The Kou & McCullagh algorithm does not attempt to generate Z from a distribution 
that is close to P*, like ours does, but rather from a distribution proportional to h(z)P*(z). 
In the case where a = 1 so that h = 1, the two approaches agree and the empirical results 
are quite similar. But when a^l, their algorithm is generally better, because is it tailored 
for the choice of a. Nevertheless, our algorithm might be useful in cases where per a (w) is 
needed for many a simultaneously, or in cases where a is very close to one. 

Supplementary Table [3] reports pei a j>(w) along with approximate standard errors de- 
fined as <jt/VT, where 



1 T 
t=\ 



It also reports an approximate relative standard error defined as 

re l T = f'" x 100%. 
ktUt 

We use T = 1000 for the e xamples with n = 500 t o match Table 1 in the main text. 
The other w are taken from IKou fc McCullagh! (|2009h and we use T = 20000 to facilitate 



comparison with their results. In some cases the true value of per a (w) is known and this 
is shown in the final column of the table; see the supplementary appendix. Except for the 
n = 500 examples and the resul ts from our algorithm, th e entries of Supplementary Table 
[3] come directly from Table 1 in lKou fc McCullaehl Jiooi). 
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Table 3: Approximating a-permanents 



parameters 


our al 


gorithm 




Kou & McCullagh 




true value 


w 


n 


a 


per Q , T (™) 


rel T % 


per Q T {w) relr% 


per Q (u;) 


I 


500 


1 


(1.220 ± 0.000) 


xlO 1134 


0.00 


(1.220 ±0.000) 


xlO 1134 


0.00 


1.220xl0 llc 


II 


500 


1 


(1.437 ± 0.001' 


xlO 1222 


0.08 


(1.441 ± 0.001) 


xlO 1222 


0.08 


? 


III 


500 


1 


(3.998 ± 0.028) 


xlO 983 


0.69 


(3.975 ± 0.033) 


xlO 983 


0.82 


? 


IV 


500 


1 


(3.523 ± 0.056) 


xlO 1133 


1.60 


(3.546 ±0.066) 


xlO 1133 


1.85 


? 


I 


500 


1/2 


(2.963 ±0.167' 


xlO 1132 


5.62 


(3.078 ±0.000) 


xlO 1132 


0.00 


3.078xl0 llc 


II 


500 


1/2 


(3.296 ±0.174) 


xlO 1220 


5.28 


(3.662 ± 0.012) 


xlO 1220 


0.32 


? 


III 


500 


1/2 


(8.889 ± 0.459) 


xlO 981 


5.16 


(1.021 ± 0.012) 


xlO 982 


1.21 


? 


IV 


500 


1/2 


(9.759 ± 0.650 : 


xlO 1131 


6.66 


(9.228 ± 0.228) 


xlO 1131 


2.47 


? 


Ai 


20 


1 


(9.800 ±0.008) 


xlO 32 


0.08 


(9.787 ± 0.014) 


xlO 32 


0.14 


9.784xl0 32 


A 2 


20 


1 


(3.513 ±0.004) 


xlO 32 


0.10 


(3.506 ±0.007) 


xlO 32 


0.21 


3.514xl0 32 


A 3 


15 


1/2 


(1.456 ±0.009) 


xlO 22 


0.60 


(1.437 ±0.003) 


xlO 22 


0.22 


1.439xl0 22 


Ai 


15 


1/2 


(7.049 ± 0.044) 


xlO 21 


0.63 


(7.043 ±0.022) 


xlO 21 


0.32 


7.034xl0 21 


A 5 


20 


1 


(3.290 ±0.003) 


xlO 49 


0.09 


(3.294 ±0.012) 


xlO 49 


0.38 


3.290xl0 49 


A 6 


20 


1 


(5.928 ± 0.024) 


xlO 40 


0.40 


(5.782 ±0.103) 


xlO 40 


1.72 


5.946xl0 40 


A 7 


15 


1/2 


(2.069 ±0.013) 


xlO 31 


0.63 


(2.092 ± 0.008) 


xlO 31 


0.37 


2.095 xlO 31 


As 


15 


1/2 


(1.579 ±0.020) 


xlO 25 


1.27 


(1.549 ±0.027) 


xlO 25 


1.68 


1.579xl0 25 


K(x) 9 


9 


1/2 


(4.524 ±0.036) 


xlO 


0.79 


(4.504 ±0.020) 


xlO 


0.43 


4.505x10° 


K(x)u 


11 


1/2 


(1.634 ±0.014) 


xlO 2 


0.86 


(1.622 ±0.009) 


xlO 2 


0.56 


1.623xl0 2 


K(x) 13 


13 


1/2 


(5.815 ±0.050) 


xlO 3 


0.86 


(5.844 ±0.026) 


xlO 3 


0.45 


5.816xl0 3 


K(x) 15 


15 


1/2 


(2.134 ±0.019) 


xlO 5 


0.89 


(2.117±0.011) 


xlO 5 


0.53 


2.114xl0 5 




100 


1/2 


(1.876 ±0.118) 


xl0~ 16 


6.28 


(1.928 ±0.037) 


xl0~ 16 


1.90 


1.911X10- 1 



E.5 Additional numerical illustrations for the uniform distribution 

Our original interest in these problems was motivated by the uniform distribution over 0* 
and we have a variety of simulations investigating this special case. This section is largely 
reproduced from one of our 2009 unpublished preprints, arXiv : 0906 . 1004vl, which focused 
on comparing different c ombinatorial approximations and was the basis for our emphasis on 
the Canfield et al. ( 20081 ) approximation in the main text. The simulations from this section 



were carried out in 2009 on a MacBook laptop with 2 GB of RAM and a 2.16 GHz dual 
core processor using Matlab. Everything in this section refers to the uniform distribution 
with w = 1. 

Supplementary Table 2] details the speed of the algorithm on 1000 x 1000 binary matrices 
with all row and column sums identical. These run-times are merely meant to provide a 
feel for how the algorithm behaves — no attempt was made to control the other processes 
operating simultaneously on the laptop. Presumably a careful C or assembly language 
implementation would run much faster. The observed runtime scales closely with the com- 
putational complexity of 0(md). So, for example, 100 x 100 n-regular matrices can be 
sampled about 100 times faster than 1000 x 1000 ri-regular matrices, and 10 x 10 matrices 
can be sampled about 10000 times faster. 



Table 4: Sampling time per 1000 x 1000 ri-regular matrix 
n 2 4 8 16 32 64 128 256 512 
time (s) 1.2 1.6 2.4 4.0 6.7 12.4 24.4 39.2 46.6 

Supplementary Table [5] reports diagnostics on these examples using T = 1000. We 
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note that the true number of 1000 x 1000 two-regular matrices is 1.75147- • • xlO 5133 ; see 
Supplementary Section TE.31 The approximation from the first row of Supplementary Table 
[5] is quite accurate. 

Table 5: Performance for the uniform distribution over 1000 x 1000 ri-regular binary ma- 
trices 



n 


A T 


CV T 








2 


0.049 


4.2x10- 


-6 


(1.75148 ±0.00011) x 


-LQ5133 


4 


0.075 


6.4x10- 


-6 


(7.64296 ±0.00061) x 


1 Q9910 


8 


0.041 


2.1x10" 


-6 


(1.01879 ±0.00005) x 


1 Q18531 


16 


0.008 


3.9x10- 


-7 


(2.31580 ±0.00005) x 


1 Q33629 


32 


0.005 


2.3x10- 


-7 


(6.50167 ±0.00010) x 


-^q59218 


G4 


0.004 


2.2x10- 


-7 


(1.22048 ±0.00002) x 


1Q100716 


128 


0.004 


1.8x10- 


-7 


(9.38861 ±0.00013) x 


1 Q163302 


256 


0.004 


2.2x10- 


-7 


(6.70630 ±0.00010) x 


1 Q243964 


512 


0.004 


2.2x10- 


-7 


(5.02208 ± 0.00 7) x 10 297711 



Bezakova et al. ( 20061 ) investigates the performance of the Chen et al. ( 20051 ) algorithm 



on pathological margins with ver y large r-\ and ci , b ut with all other row and column sums 



exactly 1. They prove that the I Chen et al.l (|2005h proposal distribution is extremely far 



from uniform for such margins, too far for importance sampling to be practical. It seems 
likely that our Q* suffers from the same problem, because of the similarities between the 
combinatorial approximations in each approach. The empirical performance of the Q* from 
the main text is quite bad in these cases; see below. On the other hand, it is straightforward 
to show that using the combinatorial approximations in Supplementary Section \C\ gives 
Q* = P* for these types of m a rgins . 

Following Bezakova et al. (j2006l ). we experiment with the margins r T = (240, !,...,!) 



and c = (179, 1, . . . , 1) for a 240 x 301 matrix. By conditioning on the entry in the first row 
and the first column and then using symmetry, one can see that 

• 1 ' 1 V 240 /V 179 / V 239 / V 178 / 

Generating a single observation takes about 0.077 s. Using T = 10 5 gives At = 4.1xl0 n , 
cv T = 1.7xl0 3 , and kt = (2.2 ± 0.3) xlO 205 , which is quite bad and highly misleading: 
approximate 95% confidence intervals created by doubling the standard errors would not 
come close to covering the true value of k. Alternatively, using the algorithm with U{ from 
Supplementary Section ICl gives At = cv 2 = and kt = k = 9.6843- •• xlO 205 , since 
Q* = P* in this case. In most practical examples, however, the algorithm presented in the 
main text is superior. 

Finally, consider Darwin's finch data (c.f. Chen et all 20051 ) which is a 13 x 17 occurrence 



matrix with r T = (14, 13, 14, 10, 12, 2, 10, 1, 10, 11, 6, 2, 17) and c = (4, 4, 11, 10, 10, 8, 9, 10, 8, 9, 
3, 10, 4, 7, 9, 3, 3). A single sample takes about 0.0 01 s. With T = 10 6 , we find A T = 2.8 x 10 3 
and cv| = 0.44 with k T = (6.722 ± 0.004) x 10 16 . IChen et al.1 (|2005l ) report the true value of 



k = 67149106137567626, and they also report a cv r of "around one" for their algorithm 
on this problem. Generally speaking, these importance sampling algorithms tend to be less 
uniform for small irregular problems like this one, than for the larger and/or more regular 
examples above. 



23 



The previous experiments are based primarily on the internal diagn ostics of s a mples from 



the proposal distribution Q*. Other than the asymptotic analysis in Blanchct (2009) con - 



cerning approximate enumeration using a variation of the algorithm of Chen et al. ( 20051 ). 
there are no external checks on the uniformity of Q*. Using a complicated, high dimen- 
sional proposal distribution without external checks can be dangerous. Indeed, consider the 
following worst-case scenario. Suppose that £1* = E U E c , where E is much smaller than 
E c , and suppose that Q* is uniform over each of E and E c , but far from uniform over Q*, 
namely, 

Q*(z) = G E} + j^l{z G E c } (\E\/\E C \ « e « 1). 

If e is extremely tiny, say e = 10~ 100 , then Monte Carlo samples from Q* will, practically 
speaking, always lie in E, which itself is a tiny fraction of Q*. Furthermore, all internal 
diagnostics will report that Q* is exactly uniform, since it is uniform over E. But, of 
course, statistical inferences based on samples from Q* will tend to be completely wrong. 
This section describes two types of experiments designed to provide external checks on the 
uniformity of Q*. 

For the first set of experiments we generate a binary matrix Z from the uniform dis- 
tribution over all binary matrices with row sums r. This is easy to do by independently 
and uniformly choosing each row of Z from one of the (") possible configurations. Since 
the conditional distribution of Z given its columns sums C is uniform over Q^ n (r,C), 
we can view Z as a single observation from the uniform distribution over n (r, C). Of 
course, there is no practical way to uniformly and independently generate another such 
Z with the same C. Notice that the importance weight f{Z) gives external information 
about the uniformity of Q* for these margins, since it gives the value of 1/Q* at a uniformly 
chosen location in Vt* mn (r,C). Indeed, in the pathological thought experiment described 
above, Z would almost certainly be in E c and f(Z) would be substantially larger than 
any of the importance weights. Alternatively, if Q* is nearly uniform, then f(Z) should be 
indistinguishable from the other importance weights. In summary, we can compare Q* to 
P* by comparing the importance weights to f{Z). This observation can also be used to 
give valid Monte Carlo p- values with importanc e sampling, even if the importance sampling 
distribution is far from the target distribution ( Harrison . 20121 ). 



Each experiment of this type proceeds identically. We fix m, n, and r. Then we 
generate L iid observations, say Z^ , . . . , Z^ , from the uniform distribution over all m x n 

binary matrices with row sums r. The column sums of these matrices are 

d) (i) 

Then, for each £ = 1, . . . , L, we generate T iid observations, say Z\ , . . . , Z T , from the 
proposal distribution Q* over Q,^ n (r, C^). We compute the ratio of maximum to minimum 
importance weights including the original observation for each £, namely, 

^(0 _ niax t= o i ,„ iT /(zf ) ) _ 1 



and we report the final summary A™ ax = max^ = i ... £ A~/ . If A™ ax is close to zero, then this 
provides evidence that Q* is approximately uniform over a large part of each QJ^ n (r, C"). 

We begin with 1000 x 1000 matrices with regular row sums r\ = ■ ■ ■ = r m , but the 
column sums will not be regular, since they are generated randomly. We use L = 10 and 
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T = 10 for the cases n = 2,8,32, finding A^ ax = 0.0002,0.0023,0.0051, respectively. For 
another example, take the row sums for the irregular 50 x 100 case that was used for Table 
2 in the main text and take k = 1, i.e., r = r. We use L = 100 and T = 1000 and find 
that A™ ax = 1.264. These preliminary experiments are encouraging, and suggest that Q* 
is indeed a good approximation of uniform P* in many cases. 

For the second type of experiment, we try to design an extreme z' € f2* and compare 
the importance weights to f(z'). Again, if Q* is approximately uniform over all of f2* then 
f(z') should be indistinguishable from the other importance weights. For these experiments 
we report 

x, _ max{f(z'),f(Z 1 ),...,f(Z T )} 
T min{/(z'),/(Z 1 ),...,/(Z T )} ' 

which should be close to zero if the region in £1* where Q* is approximately uniform includes 
z'. 

Consider the regular case where m = n = 1000 and r\ = r{ = Cj for all Suppose 
that r\ evenly divides 1000 and let z' be comprised only of disjoint r% x r% blocks of ones. In 
particular, take z'^ = 1 for (k—l)r\ + \ < i,j < kr\ and for k = 1, . . . , 1000/ri. For the cases 
r\ = 2,4,8 we compute f(z') and compare it to the data that generated the corresponding 
parts of table El obtaining A^ = 0.741, 26.24, 6. 25x 10 4 , respectively. Clearly, Q* is not 
a uniformly accurate approximation of P* over all of il* and is unlikely to be useful as a 
proposal distribution for rejection sampling to get exact samples from P*. Nevertheless, Q* 
seems to be extremely well-suited as a proposal distribution for importance sampling. For 
another example, consider the irregular 50 x 100 case that was used for Table 2 in the main 
text and take k = 1, i.e., r = r and c = c. We construct a pathological z' as follows. Place 
ci ones in the last c\ rows, corresponding to the smallest row sums, of the first column. 
Place C2 ones in the last available C2 rows of the second column, where a row is available 
if placing a one in that row will not exceed the row sum for that row. Continue in this 
manner until all the columns are assigned or until a column cannot be assigned successfully. 
In general, this procedure is not guaranteed to terminate successfully, but it does for this 
choice of margins. The resulting z' is unusual because rows and columns with large sums 
tend to have zeros at the intersecting entry. Using the data from the corresponding part of 
Table 2 in the main text gives A^ = 14.37. 

Supplementary Appendix 

The number of 100 x 100 two-regular binary matrices 

2969298425 4879211020 5463258948 9046531125 6932010720 0899043082 6661472985 5602957737 
5386603250 7914169840 3947972542 0803105057 9494091210 8196163985 3132939771 8223074880 
1582489734 4113002630 0345104451 5505567811 8301236764 6670284335 5753266570 2919415207 
2361422613 1731302283 4023510256 2089359423 4174989926 4000000000 0000000000 00000 
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The number of 500 x 500 two-regular binary matrices 



2276586004 3872645654 7163822917 6140246378 
7024859918 5873168947 8428993358 7710991052 
7751901014 9459428637 5300752209 6236400145 
8803085437 1426603063 9060350752 5676829379 
5382285704 7911170936 9213162976 1124369611 
3783877264 5366936440 0850289755 0247749665 
4052084791 8351006525 6516882276 6819426986 
5313886817 9879619523 6757312609 9563935644 
3607836217 2202522127 9381851238 5339132917 
4200343275 4426328049 4429348983 4734923700 
1420144526 3238392163 0625410465 1147246306 
4878729606 4682614593 4828351763 2549610945 
0511243887 9654470644 8952801709 4100687806 
9226192052 8186323173 8128828855 7307283447 
1430690842 9240854111 3288457886 5068062328 
0105313331 5631006370 7547904555 8541951209 
5271576323 7806458898 5197173413 2333790169 
0789892314 7885014128 9831206980 1470933069 
0084914243 1261925800 3966112818 5515090740 
6682749949 6649629337 6729065663 7740220752 
9944691129 8101632276 5735759559 3131678694 
5086650691 0193318778 3650834389 5788540935 
2767810044 6245991329 5809908199 9005968612 
0446650398 8183375905 5124117054 7434261832 
9467531245 7165626500 2517876951 1088955612 
8206541306 8546637026 9648941941 8803223917 
0828596942 2197021633 2700563010 0820849326 
0000000000 0000000000 0000000000 0000000000 
0000000000 0000000000 0000000 



6529219189 6007058852 1885701633 9308224336 
6831024823 1020957186 1359882527 3634597638 
2272455600 2450498447 6886449802 2657577100 
2441654640 4384402364 9178512515 5701834312 
0263144354 2492660647 6317501009 4702298551 
4582496735 4778933695 9359401807 4728987947 
4276522770 8754690714 8153703130 7689579335 
5860973860 5720751902 8525628015 1655464790 
8663772909 4697618230 9562268584 1389355037 
0635594018 1200043308 9996436581 2082429967 
0267066287 2838455102 1984436331 4795820153 
2823414530 6966187549 3636469942 1582542169 
1803581920 0502635810 6084543151 8196763100 
5486503911 0996089630 7969624574 8668199425 
1130147009 2410850737 0194640624 5215023611 
3762970404 4299114208 6898539174 1261578007 
8450503603 6175432120 4646913329 9283772618 
2885920165 3886059912 3547627990 2473766270 
2869173796 5265773700 6653705150 0776999823 
0069908832 1026134189 8109544591 4141299020 
4342947732 7389063830 1146871076 6098180223 
3233656140 3425148468 8948999361 5539721393 
6446584189 0334076925 7082772956 3377889631 
8900372657 5745038153 2952534928 4112016395 
4288697963 9375087520 6487400471 4382991165 
8589969888 6361729999 1147924387 2385375087 
1167561772 1388697124 8640000000 0000000000 
0000000000 0000000000 0000000000 0000000000 



Exactly computing the a-permanent of a constant matrix 

If 7r = (tti, . . . , 7r„) is a permutation chosen uniformly at random and C is the number of disjoint 
cycles in 7r, t hen C has the same distribution as B\ + ■ ■ ■ + B n , where each Bi is independent 
Bernoulli(l/i) (|Durrettl l201ol Lemma 2.2.5). If w is an n x n constant matrix with common entry 
b, then 



per Q (w) = n\b n E{a c ) = n\b n ]J E(a Bl ) = n\b n ]J ( - + (l - - 



1. 



i + a — 1 



We used this formula with n = 500 and b = 1 to get the true value of per Q (u;) for w in class I in 
Supplementary Table [3J 



Matlab implementation of the algorithm 

This is a place-holder for cleaner, shorter code that will be inserted prior to publication. Software 
will also be available on the author's website. 



function ElogQ,logP,alist] = BernoulliMarginsRnd(SampN,rN , cN , wN.pf lag,uf lag, cf lag ,bIN) 
'/.function [logQ , logP , alist] = BernoulliMarginsRndCN.r , c , w,pf lag, wf lag, cf lag.Binput) 
V. 

V, Approximate sampling from independent Bernoulli random variables BCi,j) 
arranged as an m x n matrix B given the m-vector of row sums r and the 
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n-vector of column sums c, i.e., given that sum(B,2)=r and sum(B,l)=c. 

'/„ An error is generated if no binary matrix agrees with r and c. 

B(i,j) is Bernoulli (p(i ,j ) ) where p(i , j )=w(i , j ) / (l+w(i , j ) ) , i.e., 
'/( w(i , j )=p(i , j ) / C l - pCi , j ) ) ■ [The case p(i,j)=l must be handled by the user 
% in a preprocessing step, by converting to p(i,j)=0 and decrementing the 
'/„ row and column sums appropriately.] 

Use w=[] for w identically 1, i.e., approximate uniform sampling over 
'/„ binary matrices with margins r and c. 

N is the sample size. Because of pre-processing, it is more efficient 
per matrix to use larger sample sizes. 

alist stores the locations of the ones in the samples. 
'/, If d = sum(r) = sum(c) , then alist is 2 x d x N. 

'/„ The 1-entries of the kth matrix are stored as alist (:,: ,k) . The 
(row, column) indices are (alist (1 ,t ,k) , alist (2 , t , k) ) for t=l :d. 

°/, If B is the kth matrix, then B can be created from alist via: 

'/. B = false(m,n); for t = 1 : size(alist ,2) , B(alist (1 ,t ,k) , alist (2, t ,k) ) = true; end 

'/, logQ(k)=log(probability that algorithm generates B) 
'/, logP(k)=log(prod(w(B))) 

If the algorithm is used for importance sampling, then the kth 
'/„ unnormalized importance weight is exp(logP(k) -logQ(k) ) . 

% NOTE for w(i,j)=0: 

'/, If the entries of w are not strictly positive, then the algorithm can 

sometimes generate matrices with logP(k)=-inf . In these cases, some of 
'/„ the entries of alist (:,:,k) may be zero and logQ(k) corresponds to the 

probability of generating that particular alist (:,: ,k) . 

% OPTIONS : 

'/, pflag: 'canfield' or (default, works best in most cases) 
7, 'greenhill' (perhaps useful for sparse and highly irregular margins) 

pflag controls which combinatorial approximations are used 

wflag: 'sinkhorn' or '' (default) 

wflag controls the initial balancing of w; it is passed to canonical. m 

'/„ cflag: 'descend' or '' (default) 

°/ 'none ' (sample columns in original order) 

cflag controls the order in which the columns are sampled 

'/, Binput is a m x n binary matrix. If it is provided, then the algorithm 
computes the probability of generating this matrix. 

if nargin < 8 I I isempty(blN) 
doIN = false; 

else 

doIN = true; 

end 

if nargin < 7 I I isempty(cf lag) 
cflag = 'descend'; 

end 

if nargin < 6 I I isempty(wf lag) 
wflag = 'sinkhorn'; 

end 

if nargin < 5 I I isempty(pf lag) 
pflag = 'canfield'; 

end 

if nargin < 4 
wN = [] ; 

end 

doW = true; 

if isempty(wN), doW = false; end 
doA = true; 

if nargout < 2, doA = false; end 

if "isscalar (SampN) I I SampN < 1 I I SampN ~= round(SampN) , error('SampN must be a positive integer'), end 

ptype = 0; 
switch lower (pflag) 
case ' canfield' 
ptype - 1; 
case ' greenhill ' 

ptype = 2; 
otherwise 

error (' unknown pflag' ) 

end 
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x 

'/. START : PREPROCESSING 

x 

sizing 
raT = numel(rN) ; 
nT = numel(cN) ; 

'/„ sort the marginals (descending) 
rT - rN(:) ; 

[rsort , rndxT] = sort (rT , ' descend ' ) ; 

if doW 

X balance the weights 
[~ , ~ , wopt] = canonical (wN, wf lag) ; 
'/, reorder the columns 
switch lower (cf lag) 
case 'none' 

cndx = l:nT; 
case 'descend' 

[" , cndx] = sortrows C - [cN( : ) var (wopt ,0,1).']); 
otherwise 

error( 'unknown cf lag' ) 

end 

csort = cN(cndx) ; 
wopt = wopt ( :, cndx) ; 

precompute log weights 
logw = log(wN) ; 

x 

precompute G 

logwopt = log(wopt); 

rmax = max(rT) ; 
G - -inf (rmax+l,mT,nT-l) ; 
GCl,:,:) = 0; 

G(2,:,nT-l) = logwopt (: ,nT) ; 

for i = l:mT 
ri = rT(i); 
for j - nT-l:-l:2 

wij = logwopt(i, j) ; 
for k = 2:ri+l 

b = G(k-l,i,j)+wij 
a - G(k,i, j) ; 
if a > -inf I I b > 
if a > b 

G(k,i,j-1) 

else 

G(k,i,j-1) 

end 

end 

end 

end 

for j - l:nT-l 

for k = l:rmax 

Gknum - G(k,i, j) ; 
Gkden = G(k+l,i, j) ; 
if isinf (Gkden) 

G(k,i,j) - -1; 

else 

G(k,i, j) = wopt(i , j)*exp(Gknum-Gkden)* ( (nT-j-k+l)/k) ; 

end 

end 

if isinf (Gkden) 

G(rmax+l,i, j) - -1; 

end 

end 

end 

x 

else 

switch lower (cf lag) 
case 'none' 

cndx = l:numel(cN); 
case ' descend' 

[csort , cndx] = sort (cN( : ) , 'descend' ) ; 
otherwise 

error (' unknown cf lag' ) 

end 

end 

generate the inverse index for the row orders to facilitate fast 
'/, sorting during the updating 
irndxT = (l:mT).'; irndxT(rndxT) = irndxT; 

'/, basic input checking 

if rsort(l) > nT I I rsort (mT) < I I csort(l) > mT I I csort(nT) < I I any(rsort ~= round(rsort) ) I I any(csort ~= round(csort) ) 
error( 'marginal entries invalid' ) 



-inf 

= a + log(l+exp(b-a) ) ; 
- b + log(l+exp(a-b)) ; 
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end 



°/, compute the conjugate of c 
cconjT = conjugate. local (csort ,mT) ; 

get the running total of number of ones to assign 
countT = sum(rsort) ; 

get the running total of sum of c squared 
ccount2T = sumfcsort . ~2) ; 

get the running total of (2 times the) column marginals choose 2 
ccount2cT = sum (csort . * (csort- 1) ) ; 

get the running total of (6 times the) column marginals choose 3 
ccount3cT = sum (csort . * (csort- 1) . * (csort -2) ) ; 

get the running total of sum of r squared 
rcount2T = sumfrsort . ~2) ; 

'/, get the running total of (2 times the) row marginals choose 2 
rcount2cT = sum (r sort . * (rsort-1) ) ; 

get the running total of (6 times the) row marginals choose 3 
rcount3cT = sum (r sort . * (rsort-1) . * (r sort -2) ) ; 

check for compatible marginals 
if countT ~= sum(csort) I I any(cumsum(rsort) > cumsum(cconjT) ) , error ('marginal sums invalid') , end 

'/„ initialize the memory 
logQ = zeros (SampN, 1) ; 
logP = zeros (SampN, 1) ; 
if doA, AN = SampN; else AN = 1; end 
alist = zeros (2, countT, AN) ; 
initialize the memory 

M = csort(l)+3; % index 1 corresponds to -1; index 2 corresponds to 0, index 3 corresponds to 1 , index M corresponds to c(l)+l 

S - zeros(M,nT) ; 
SS - zeros(M,l) ; 

epsO = eps(0) ; % used to prevent divide by zero 

x % 

X END: PREPROCESSING X 

x % 

X loop over the number of samples 
for SampLoop = 1: SampN 



INITIALIZATION % 

if doA, ALoop = SampLoop; else ALoop = 1; end 

7, copy in initialization 

r = rT; 

rndx = rndxT; 

irndx = irndxT; 

cconj = cconjT; 
count = countT; 
ccount2 = ccount2T; 
ccount2c = ccount2cT; 
ccount3c = ccount3cT; 
rcount2 = rcount2T; 
rcount2c = rcount2cT; 
rcount3c = rcount3cT; 
m - mT; 
n = nT; 

initialize 

place = 0; % most recent assigned column in alist 
logq = 0; running log probability 
logp = 0; 

% % 

'/. START: COLUMN-WISE SAMPLING % 

X % 



'/, loop over columns 

for cl = l:nT 

X % 

% START: SAMPLE THE NEXT "COLUMN" % 

X % 

'/, remember the starting point for this columns 
placestart = place + 1; 

jr 

'/, sample a col 

jr 



label = cndx(cl) ; current column label 
colval = csort(cl); current column value 
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if colval == | | count == 0, break, end 

update the conjugate 
for i = 1: colval 

cconj (i) = cconj (i)-l; 

end 

update the number of columns remaining 
n = n - 1; 

'/„ DP initialization 

smin = colval; 
smax = colval; 
cumsums = count ; 

update the count 
count = count - colval; 

update running total of sum of c squared 
ccount2 = ccount2 - colval~2; 

update the remaining (two times the) sum of column sums choose 2 
ccount2c = ccount2c - colval* (colval-1) ; 

update the remaining (six times the) sum of column sums choose 3 
ccount3c = ccount3c - colval* (colval-1) * (colval-2) ; 

cumconj = count ; 

SSCcolval+3) = 0; 
SS(colval+2) = 1; 
SSCcolval+1) = 0; 

get the constants for computing the probabilities 
'/, it is faster to compute them all, than to check pflag 
d = ccount2c/count"2 ; 
if (count ==0) || (m*n == count) 
weight A = 0; 

else 

ueightA = m*n/(count*(m*n-count)) ; 

weightA = weightA* (1-weightA* (ccount2-count~2/n) ) /2 ; 

end 

d2 - ccount2c/(2*count"2+eps0) + ccount2c/(2*count~3+eps0) + ccount2c"2/(4*count"4+eps0) ; 
d3 = -ccount3c/(3*count"3+eps0) + ccount2c~2/(2*count~4+eps0) ; 

d22 = ccount2c/(4*count"4+eps0) + ccount3c/ (2*count~4+eps0) - ccount2c~2/(2*count~5+eps0) ; 

y t dynamic programming 

SSS - 0; 

'/, loop over (remaining and sorted descending) rows in reverse 
for i = m: -1 : 1 

7, get the value of this row and use it to compute the 

probability of a 1 for this row/column pair 
rlabel = rndx(i) ; 
val = r (rlabel) ; 
if ptype == 1 
% canfield 

p = val*exp(weightA* (1-2* (val-count/m) ) ) ; 
p = p./(n+l-val+p) ; 
q = 1-p; 
elseif ptype == 2 
% greenhill 

q = 1/ ( l+val*exp ( (2*d2+3*d3* (val-2) +4*d22* (rcount2c-val+l) ) * (val- 1) ) ) ; 

p = !-q; 

else 

% never get here 

p = 0; q = 0; '/, helps compiler 

end 

'/, incorporate weights 

if doW Ml n > && val > 

Gk - G(val, rlabel, cl) ; 

if Gk < 
q - 0; 

else 

p = p*Gk; 

end 

end 

V, update the feasibility constraints 
cumsums = cumsums - val ; 
cumconj = cumconj - cconj (i) ; 

sminold = smin; 
smaxold = smax; 

incorporate the feasibility constraints into bounds on the 
°/ t running column sum 

smin = max(0,max(cumsums-cumconj , sminold-1) ) ; 
smax = min(smaxold, i-1) ; 

DP iteration 

SSS - 0; 
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SSCsmin+1) = 0; % no need to set S(l:smin) = 0, since it is not accessed 
for j = smin+2 : smax+2 

a = SS(j)*q; 

b = SS(j+l)*p; 

apb = a + b; 

SSS = SSS + apb; 

SS(j) = apb; 

S(j,i) = b/(apb+eps0) ; 

end 

SS(smax+3) = 0; 7. no need to set S(smax+4:end) = 0, since it is not accessed 

X check for impossible 
if SSS <= 0, break, end 

% normalize to prevent overflow/underflow 
for j = smin+2 : smax+2 

SS(j) = SS(j) / SSS; 

end 



7, check for impossible 

if SSS <= 0, logp - -inf; break, end 

'/, sampling 

j = 2; % running total (offset to match indexing offset) 
jmax = colval + 2; 

if j < jmax 7, skip assigning anything when colval == 
if doIN 

for i = l:m 

7» get the transition probability of generating a one 

p = SCj.i); 

% get the current row 

rlabel = rndx(i) ; 

if bIN(rlabel, label) 

7. if we have a generated a one, then decrement the current 

% row total 

val = r(rlabel); 

r(rlabel) = val-1; 

rcount2 = rcount2 - 2*val + 1; 
rcount2c = rcount2c - 2*val + 2; 
rcount3c = rcount3c - 3* (val-1) * (val-2) ; 

7« record the entry and update the log probability 
place = place + 1; 
logq = logq + log(p) ; 

if doW, logp = logp + logw (rlabel, label) ; end 
alist (1 , place , ALoop) = rlabel; 
alist (2 , place , ALoop) = label; 

i - J + i; 

7« the next test is not necessary, but seems more efficient 
7» since all the remaining p's must be 
if j == jmax, break, end 

else 

logq = logq + log(l-p); 

end 

end 

else 

for i = l:m 

7« get the transition probability of generating a one 
p = S(j,i); 
if rand <= p 

7. if we have a generated a one, then decrement the current row total 
rlabel = rndx(i) ; 
val = r(rlabel); 
r (rlabel) = val-1; 

rcount2 = rcount2 - 2*val + 1; 
rcount2c = rcount2c - 2*val + 2; 
rcount3c = rcount3c - 3* (val-1) * (val-2) ; 

'/, record the entry and update the log probability 
place = place + 1; 
logq = logq + log(p) ; 

if doW, logp = logp + logw (rlabel, label) ; end 
alist (1 , place , ALoop) = rlabel; 
alist (2 , place , ALoop) = label; 

j - J + i; 

7» the next test is not necessary, but seems more efficient 
'/, since all the remaining p's must be 
if j == jmax, break, end 

else 

logq = logq + log(l-p); 

end 

end 

end 
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end 



x x 

X END: SAMPLE THE NEXT "COLUMN" X 

x x 

if count == 0, break, end 



'/„ everything is updated except the sorting 

% 

jj 1 

I START: RESDRT THE NEW ROW SUMS '/„ 

% % 

°/ c re-sort the assigned rows 

'/„ this code block takes each row that was assigned to the list 
'/, and either leaves it in place or swaps it with the last row 

that matches its value; this leaves the rows sorted (descending) 
'/, since each row was decremented by only 1 

looping in reverse ensures that least rows are swapped first 
for j = place : -1 rplacestart 

°L get the row label and its new value (old value -1) 
k = alistfi, j ,ALoop) ; 
val - r(k) ; 

find its entry in the sorting index 
irndxk = irndx(k) ; 

'/„ look to see if the list is still sorted 
irndxkl = irndxk + 1 ; 

if irndxkl > m I I r(rndx( irndxkl) ) <= val 
7, no need to re-sort 
continue ; 

end 

find the first place where k can be inserted 
irndxkl = irndxkl + 1; 

while irndxkl <= m kk r(rndx(irndxkl) ) > val 
irndxkl = irndxkl + 1; 

end 

irndxkl = irndxkl - 1; 

now swap irndxk and irndxkl 
rndxkl = rndx(irndxkl) ; 
rndx(irndxk) = rndxkl; 
rndx(irndxkl) = k; 
irndx (k) = irndxkl ; 
irndx (rndxkl) = irndxk ; 

end 

x x 

X END: RESDRT THE NEW ROW SUMS X 

x x 

r (rndx(rndxl :rndxm) ) is sorted descending and has exactly those 
'/„ unassigned rows 

'/„ rndx(rndxl:rndxm) still gives the labels of those rows 
'/, rndx ( irndx (k) ) = k 

1 

'/, c(ci+i:cn) is sorted descending and has exactly those unassigned columns 
cndx(cl+l:cn) still gives the labels of those columns 

x 

°L m, n, count, ccount2, ccount2c are valid for the remaining rows, cols 



logQ(SampLoop) = logq; 
logP(SampLoop) = logp; 



% y t 

% y c 

y c % 

% END DF MAIN FUNCTION °/ c 

% % 

% % 

y c % 

'/, helper function (just to keep everything together... not for efficiency, 
since it is only called once) 

function cc = conjugate_local(c,n) 
'/„ function cc = conjugate (c ,n) 
1 

°/ t let c(:) be nonnegative integers 
cc(k) = sum(c >== k) for k = l:n 

cc = zeros(n, 1) ; 
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°/,c = rain(c ,n) 



for j = l:numel(c) 
k - c(j); 
if k n 

cc(n) = cc(n) + 1; 
elseif k >= 1 

cc(k) - cc(k) + 1; 

end 

end 

s = cc(n) ; 

for j - n-l:-l:l 

s = s + cc(j) ; 

cc(j) = s; 

end 

x 

function [a,b ,abu,k] = canonical (w, f lag, tol ,maxiter ,r , c) 

[m,n] = size(w) ; 

if nargin <6 I I isempty(c) 
c = ones(l.n) ; 
elseif size (c , 1) ~= 1 
c = c(:). '; 
end 

if nargin <5 I I isempty(r) 

r = ones(m, 1) ; 

elseif size (r , 2) ~= 1 

r = r(:)j 

end 

if nargin <4 I I isemptyCmaxiter) 
maxiter = 10"5; 

end 

if nargin <3 I I isemptyCtol) 
tol - le-8; 

end 

if nargin <2 I I isemptyCf lag) 
flag = 'sinkhorn'; 

end 

switch lower (flag) 

case ' sinkhorn' 

M = sum(w>0,l); N = sum(w>0,2); 
a = N . /sum(u,2) ; a = a/mean (a) ; 
b = M. /sum(bsxfun(@times , a, w) ,1) ; 

if tol >= 0, aO - a; bO - b; end 

k = 0; 

tolcheck = inf ; 

while k < maxiter kk tolcheck > tol 
k - k + 1; 

a = N. /sum(bsxfun(@times ,b , w) , 2) ; a = a/mean (a) ; 
b = H. /sum(bsxfun(@times , a, w) , 1) ; 

if tol >= 

tolcheck = sumfabs (a-aO) ) +sum(abs (b-bO) ) ; 

aO = a; bO - b; 
end 

end 

case ' sinkhorn-col ' 
w = f liplr(w) ; 

M = sura(w>0,l); N = cumsum(w>0,2) ; 
aa = N . /cumsum(w,2) ; 

b = M./sum(w.*aa,l) ; b = b / mean(b) ; 
a = aa( : ,n) ; 

if tol >= 0, aO - a; bO = b; end 
k - 0; 

tolcheck = inf; 

while k < maxiter kk tolcheck > tol 
k - k + 1; 

aa = N. /cumsum(bsxf un(@times ,b , w) , 2) ; 
b = H. /sura(w. *aa, 1) ; b / raean(b) ; 
a = aa( : ,n) ; 

if tol >= 

tolcheck = sumfabs (a-aO) ) +sum(abs (b-bO) ) ; 

aO = a; bO - b; 



end 
end 

w = f liplr(u) ; 
b = fliplrCb); 

case 'log* 

wO - w > 0; 

H = sura(wO,l); N = sum(w0,2); 
logw = log(w+~w0) ; 
a = exp(-sum(logw,2) ./N) ; 
b = exp(-sum(logu, 1) ./M) ; 

case 'entropy' 

wl = Cw > 0) ./max(w,epsC0)) ; 

a = sqrt (sum(ul ,2) . /sum(w ,2) ) ; a = a/mean (a) ; 

b = sqrt (sum(bsxfun(©rdivide , wl , a) , 1) . /sumCbsxf un(©times ,a, w) , 1) ) ; 
if tol >= 0, aO - a; bO - b; end 
k = 0; 

tolcheck = inf ; 

while k < maxiter kk tolcheck > tol 
k - k + 1; 

a = sqrt (sum(bsxfun(©rdivide , wl ,b) ,2) . /sumCbsxf un(©times ,b , w) , 2) ) ; a = a/mean (a) ; 
b = sqrt (sum(bsxfun(©rdivide ,wl , a) , 1) . /sumCbsxf un(©times , a, w) , 1) ) ; 

if tol >= 

tolcheck = sumfabs (a-aO) )+sum(abs Cb-bO) ) ; 

aO = a; bO - b; 
end 

end 
case '12* 
w2 = w. "2; 

a = sum(u,2) . /sum(u2 , 2) ; a = a/mean (a) ; 

b = sum(bsxfun(©times ,a, w) , 1) . /sum(bsxfun(©times , a. "2 , w2) , 1) ; 
if tol >= 0, aO - a; bO - b; end 
k - 0; 

tolcheck = inf; 

while k < maxiter kk tolcheck > tol 
k - k + 1; 

a = sum (bsxfun (©times ,b , w) , 2) . /sum(bsxfun(©times ,b. "2 , w2) , 2) ; a = a/mean (a) ; 
b = sum(bsxfun(©times ,a, w) , 1) . /sum(bsxfun(©times , a. "2 , w2) , 1) ; 

if tol >= 

tolcheck = sumfabs (a-aO) )+sum(abs (b-bO) ) ; 

aO = a; bO - b; 
end 

end 
case '12p' 

w2 = w."2; 

c = (1+w) ."3; 
a = sum(w. /c ,2) . /sum(u2 . /c , 2) ; a = a/mean (a) ; 

c = (l+bsxfun(©times , a,w) ) . "3; 
b = sum (bsxfun (©times ,a, w) . /c , 1) . /sumCbsxf un( ©times ,a."2,u2) ./c,l) ; 

if tol >= 0, aO = a; bO - b; end 

k = 0; 

tolcheck = inf; 

while k < maxiter kk tolcheck > tol 
k = k + 1; 

c - (l+a*b. *w) . "3; 

a = sum(bsxfun(©times ,b , w) . /c , 2) . /sumCbsxf un(©times ,b . "2 ,w2) . /c , 2) ; a = a/mean C a) ; 

c = (l+a*b.*w) ."3; 
b = sum (bsxfun (©times ,a, w) . /c , 1) . /sumCbsxf un( ©times ,a.~2,u2) ./c,l) ; 

if tol >= 

tolcheck = sumCabs Ca-a0) )+sumCabs Cb-bO) ) ; 

aO - a; bO - b; 
end 

end 

case 'ratio' 

wz = w > 0; 
w(~wz) = eps(O) ; 
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a = sqrt (sum(uz . /w, 2) . /sum(u, 2) ) ; a = a/mean (a) ; 
b = sqrt ( siim (wz . /(bsxfunC ©times , a, w) ) , 1) . /sum (bsxf un( ©times , a, w) , 1) ) ; 

if tol >= 0, aO - a; bO - b; end 

k = 0; 

tolcheck = inf ; 

while k < maxiter kk tolcheck > tol 
k = k + 1; 

a = sqrt(sum(uz./(bsxfunC@times,b,u)) ,2) ./sum(bsxfun(Stimes,b,w) ,2)) ; a = a/mean (a) ; 

b = sqrt (sum (wz . / (bsxf un( ©times ,a, w) ) ,1) . /sum (bsxf un(@times , a, w) , 1) ) ; 

if tol >= 

tolcheck = sumCabs (a-aO) )+sum(abs Cb-bO) ) ; 

aO - a; bO - b; 
end 

end 

case 'barvinok' 

s = log(r/n) ; 
t = log(c/m) ; 

H - w.*(exp(s)*exp(t)) ; 
M = M ./ (1+M); 

sMr = sum(M,2)-r; 
sMc = sum(M,i)-c; 

tolcheck = sumCabs CsMr) ) +sum(abs CsMc) ) ; 

alpha = .01; 

while tolcheck > tol 

s = s - alpha* sMr; 
t = t - alpha* sMc; 

M = u. * (exp(s) *exp(t) ) ; 
M = M ./ (1+M); 

sMr = sum(M,2)-r; 
sMc = sum(M,l)-c; 

tolcheck = sumfabs (sMr) )+sum(abs (sMc) ) ; 
end 

a = exp(s) ; 
b = exp(t) ; 

otherwise 

error( 'unknown flag' ) 

end 

if nargout > 2, abw = a*b.*w; end 
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